基于变压器的模型在多个领域和任务上显示了它们的有效性。自我注意力允许将所有序列元素的信息结合到上下文感知表示形式中。但是,全球和本地信息必须主要存储在相同的元素表示中。此外,输入序列的长度受到自我注意的二次计算复杂性的限制。在这项工作中,我们提出并研究了一个记忆启动的片段级循环变压器(复发记忆变压器)。内存允许借助复发的帮助存储和处理本地和全局信息,并可以在长序列的段之间传递信息。我们通过将特殊的内存令牌添加到输入或输出序列中,实现了一个内存机制,无需更改变压器模型。然后,对变压器进行了训练,以控制内存操作和序列表示处理。实验的结果表明,我们的模型与Transformer-XL在语言建模上的较小内存大小上的表现相同,并在需要更长序列处理的任务方面胜过它。我们证明,将内存令牌添加到TR-XL可以提高IT性能。这使得反复的内存变压器成为需要学习长期依赖性和内存处理中的通用性(例如算法任务和推理)的应用程序的有前途的体系结构。
translated by 谷歌翻译
在许多顺序任务中,模型需要记住遥远过去的相关事件,以做出正确的预测。不幸的是,基于梯度的训练的直接应用需要为序列的每个元素存储中间计算。如果一个序列由数千甚至数百万个元素组成,则需要过大的计算记忆,因此,学习非常长期的依赖性不可行。但是,通常只能考虑到时间上的局部信息来预测大多数序列元素。另一方面,仅在局部信息的情况下,受长期依赖性影响的预测稀疏,其特征是高不确定性。我们提出了一种新的培训方法,该方法允许一次学习长期依赖性,而无需一次通过整个序列进行反向传播梯度。该方法可以潜在地应用于任何基于梯度的序列学习。复发体系结构的磁化实现更好或与基线相媲美,同时需要大大减少计算内存。
translated by 谷歌翻译
我们提出了一种基于体积的基于网格的算法,用于参数化胎盘到扁平模板,以实现局部解剖结构和功能的有效可视化。 MRI显示潜在作为研究工具,因为它提供与胎盘功能直接相关的信号。然而,由于胎盘体内形状的弯曲和高度变化,解释和可视化这些图像是困难的。我们通过绘制胎盘来解决解释挑战,以便它类似于熟悉的离体形状。我们将参数化作为优化问题,用于将体积网格表示的胎盘形状映射到扁平模板。我们采用对称的Dirichlet Energy来控制整个体积的局部变形。在梯度下降优化期间,映射中的局部注射是由约束的线路搜索强制执行的。我们使用从大胆的MRI图像中提取的111个胎盘形状的研究研究验证了我们的方法。我们的映射在匹配模板时实现了子体素准确性,同时保持整个音量的低失真。我们展示了胎盘的扁平化程度如何改善解剖学和功能的可视化。我们的代码在https://github.com/mabulnaga/plentaa-flatteny自由提供。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
Quadruped robots are currently used in industrial robotics as mechanical aid to automate several routine tasks. However, presently, the usage of such a robot in a domestic setting is still very much a part of the research. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from the speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost at par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
translated by 谷歌翻译
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
translated by 谷歌翻译
When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
translated by 谷歌翻译
and widely used information measurement metric, particularly popularized for SSVEP- based Brain-Computer (BCI) interfaces. By combining speed and accuracy into a single-valued parameter, this metric aids in the evaluation and comparison of various target identification algorithms across different BCI communities. To accurately depict performance and inspire an end-to-end design for futuristic BCI designs, a more thorough examination and definition of ITR is therefore required. We model the symbiotic communication medium, hosted by the retinogeniculate visual pathway, as a discrete memoryless channel and use the modified capacity expressions to redefine the ITR. We use graph theory to characterize the relationship between the asymmetry of the transition statistics and the ITR gain with the new definition, leading to potential bounds on data rate performance. On two well-known SSVEP datasets, we compared two cutting-edge target identification methods. Results indicate that the induced DM channel asymmetry has a greater impact on the actual perceived ITR than the change in input distribution. Moreover, it is demonstrated that the ITR gain under the new definition is inversely correlated with the asymmetry in the channel transition statistics. Individual input customizations are further shown to yield perceived ITR performance improvements. An algorithm is proposed to find the capacity of binary classification and further discussions are given to extend such results to ensemble techniques.We anticipate that the results of our study will contribute to the characterization of the highly dynamic BCI channel capacities, performance thresholds, and improved BCI stimulus designs for a tighter symbiosis between the human brain and computer systems while enhancing the efficiency of the underlying communication resources.
translated by 谷歌翻译
A step-search sequential quadratic programming method is proposed for solving nonlinear equality constrained stochastic optimization problems. It is assumed that constraint function values and derivatives are available, but only stochastic approximations of the objective function and its associated derivatives can be computed via inexact probabilistic zeroth- and first-order oracles. Under reasonable assumptions, a high-probability bound on the iteration complexity of the algorithm to approximate first-order stationarity is derived. Numerical results on standard nonlinear optimization test problems illustrate the advantages and limitations of our proposed method.
translated by 谷歌翻译
Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based Neural Architecture Search (NAS) method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. To this end, we introduce the Pseudo-Inverted Bottleneck conv block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and outperforms a DARTS network with similar size significantly, at layer counts as small as 2. Furthermore, with less layers, not only does it achieve higher accuracy with lower GMACs and parameter count, GradCAM comparisons show that our network is able to better detect distinctive features of target objects compared to DARTS.
translated by 谷歌翻译